Selecting Multiway Splits in Decision Trees
نویسندگان
چکیده
Decision trees in which numeric attributes are split several ways are more comprehensible than the usual binary trees because attributes rarely appear more than once in any path from root to leaf. There are efficient algorithms for finding the optimal multiway split for a numeric attribute, given the number of intervals in which it is to be divided. The problem we tackle is how to choose this number in order to obtain small, accurate trees. We view each multiway decision as a model and a decision tree as a recursive structure of such models. Standard methods of choosing between competing models include resampling techniques (such as cross-validation, holdout, or bootstrap) for estimating the classification error; and minimum description length techniques. However, the recursive situation differs from the usual one, and may call for new model selection methods. This paper introduces a new criterion for model selection: a resampling estimate of the information gain. Empirical results are presented for building multiway decision trees using this new criterion, and compared with criteria adopted by previous authors. The new method generates multiway trees that are both smaller and more accurate than those produced previously, and their performance is comparable with standard binary decision trees.
منابع مشابه
Classification Trees With Unbiased Multiway Splits
Two univariate split methods and one linear combination split method are proposed for the construction of classification trees with multiway splits. Examples are given where the trees are more compact and hence easier to interpret than binary trees. A major strength of the univariate split methods is that they have negligible bias in variable selection, both when the variables differ in the num...
متن کاملSelecting the best categorical split for classification trees
Based on a family of splitting criteria for classification trees, methods of selecting the best categorical splits are studied. They are shown to be very useful in reducing the computational complexity of the exhaustive search method. Keyword: classification tree; power divergence; splitting criteria
متن کاملSplit Selection Methods for Classification Trees
Classification trees based on exhaustive search algorithms tend to be biased towards selecting variables that afford more splits. As a result, such trees should be interpreted with caution. This article presents an algorithm called QUEST that has negligible bias. Its split selection strategy shares similarities with the FACT method, but it yields binary splits and the final tree can be selected...
متن کاملCanonical Correlation Forests
We introduce canonical correlation forests (CCFs), a new decision tree ensemble method for classification. Individual canonical correlation trees are binary decision trees with hyperplane splits based on canonical correlation components. Unlike axisaligned alternatives, the decision surfaces of CCFs are not restricted to the coordinate system of the input features and therefore more naturally r...
متن کامل